Semiautomatic Extension of CoreNet using a Bootstrapping Mechanism on Corpus-based Co-occurrences

Authors

Christian Biemann

Saim Shin

Key-Sun Choi

Abstract

The paper describes a language-independent approach to the semiautomatic extension of lexical-semantic word nets and evaluates the method on CoreNet, the Korean version of word net. In a bootstrapping fashion, the so-called ‘Pendulum Algorithm’ operates on word sets obtained from co-occurrence statistics on a large unannotated corpus and keeps error propagation low through a verification step. The results are not sufficient for fully automatic extension, but they provide a good candidate set. Further improvements are discussed.
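The abstract gives no procedural detail, so the following is only a minimal sketch of generic bootstrapping over co-occurrence sets with a verification step. It is not the Pendulum Algorithm as published. The function name `bootstrap`, the thresholds `min_support` and `min_overlap`, and the toy table `TOY_COOC` are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical toy data: each word maps to the set of words it co-occurs with.
# In practice such sets would come from co-occurrence statistics on a corpus.
TOY_COOC = {
    "apple": {"pear", "banana", "juice", "market"},
    "pear": {"apple", "banana", "market", "plum"},
    "banana": {"apple", "pear", "plum"},
    "plum": {"banana", "pear"},
    "juice": {"apple", "glass", "drink", "bottle"},
    "market": {"apple", "pear", "stock", "price", "trade"},
}


def bootstrap(seeds, cooc, min_support=2, min_overlap=0.5, max_rounds=5):
    """Grow a word set from seeds; return only the newly proposed words.

    Assumed scheme (not taken from the paper): a search step collects
    unknown words that co-occur with several known members, and a
    verification step filters them before they can influence later rounds.
    """
    known = set(seeds)
    for _ in range(max_rounds):
        # Search step: count how many known members each unknown word
        # co-occurs with.
        support = Counter()
        for word in known:
            for neighbour in cooc.get(word, ()):
                if neighbour not in known:
                    support[neighbour] += 1
        candidates = [w for w, n in support.items() if n >= min_support]

        # Verification step: accept a candidate only if a large enough share
        # of its own co-occurrences are already known members.
        accepted = set()
        for cand in candidates:
            neighbours = cooc.get(cand, set())
            if neighbours and len(neighbours & known) / len(neighbours) >= min_overlap:
                accepted.add(cand)

        if not accepted:
            break  # nothing new passed verification; stop iterating
        known |= accepted
    return known - set(seeds)


if __name__ == "__main__":
    print(sorted(bootstrap({"apple", "pear"}, TOY_COOC)))
```

On the toy data, "market" co-occurs with both seeds but fails verification because most of its own co-occurrences lie outside the set, which illustrates how a verification step can limit error propagation. In a semiautomatic setting, the returned words would be treated as candidates for human review, not as accepted entries.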


Similar resources

A Korean-Japanese-Chinese Aligned Wordnet with Shared Semantic Hierarchy

A Korean-Japanese-Chinese aligned wordnet, “CoreNet”, is introduced. For the purpose of this paper, the term “wordnet” refers to a network of words. It is constructed on a shared semantic hierarchy that originates from NTT Goidaikei (Lexical Hierarchical System). The Korean wordnet was constructed by assigning a semantic category to every meaning of the Korean words in a dictionary. Ver...


Semantic Bootstrapping with a Cluster-Based Extension to DIPRE

The practical applications of information extraction are currently limited by the need to hand-construct search patterns and lexicons and / or to have available large labelled training sets. To address this issue, we present a semantic bootstrapping technique based on Brin’s DIPRE algorithm. The basic algorithm is extended by using clustering to group similar occurrences when extracting new pat...


Computer Assisted Semantic Annotation in the DutchSemCor Project

The goal of this paper is to describe the annotation protocols and the Semantic Annotation Tool (SAT) used in the DutchSemCor project. The DutchSemCor project is aiming at aligning the Cornetto lexical database with the Dutch language corpus SoNaR. 250K corpus occurrences of the 3,000 most frequent and most ambiguous Dutch nouns, adjectives and verbs are being annotated manually using the SAT. ...


Hedges in English for Academic Purposes: A Corpus-based study of Iranian EFL learners

Hedges, as tools to express tentativeness and doubt, have been studied in plenty of research papers in the Iranian EFL research setting. However, their use in a learner corpus, portraying Iranian learner English, is in need of more research attention. With this end in view, this study aimed at investigating how Iranian EFL learners who have majored in English-related fields in Iran deployed hed...


Image Steganalysis Based on Co-Occurrences of Integer Wavelet Coefficients

We present a steganalysis scheme for LSB matching steganography based on feature vectors extracted from integer wavelet transform (IWT). In integer wavelet decomposition of an image, the coefficients will be integer, so we can calculate co-occurrence matrix of them without rounding the coefficients. Before calculation of co-occurrence matrices, we clip some of the most significant bitplanes of ...




Publication date: 2004
